Polynomial Filtering for Fast Convergence in Distributed Consensus 
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In the past few years, the problem of distributed consensus has received a lot of attention, 
particularly in the framework of ad hoc sensor networks. Most methods proposed in the liter- 
ature address the consensus averaging problem by distributed linear iterative algorithms, with 
asymptotic convergence of the consensus solution. The convergence rate of such distributed 
algorithms typically depends on the network topology and the weights given to the edges be- 
tween neighboring sensors, as described by the network matrix. In this paper, we propose to 
accelerate the convergence rate for given network matrices by the use of polynomial filtering 
algorithms. The main idea of the proposed methodology is to apply a polynomial filter on the 
network matrix that will shape its spectrum in order to increase the convergence rate. Such 
an algorithm is equivalent to periodic updates in each of the sensors by aggregating a few of 
its previous estimates. We formulate the computation of the coefficients of the optimal poly- 
nomial as a semi-definite program that can be efficiently and globally solved for both static 
Q\ ■ and dynamic network topologies. We finally provide simulation results that demonstrate the 

effectiveness of the proposed solutions in accelerating the convergence of distributed consensus 
^vq . averaging problems. 

o 

oo 

O ■ 1 Introduction 



We consider the problem of distributed consensus [1] that has become recently very interesting 
especially in the context of ad hoc sensor networks. In particular, the problem of distributed 
average consensus has attracted a lot of research efforts due to its numerous applications in diverse 
areas. A few examples include distributed estimation [2], distributed compression [3], coordination 
of networks of autonomous agents [4] and computation of averages and least-squares in a distributed 
fashion (see e.g., [5, 6, 7, 8] and references therein). 

In general the main goal of distributed consensus is to reach a global solution using only local 
computation and communication while staying robust to changes in the network topology. Given the 
initial values at the sensors, the problem of distributed averaging is to compute their average at each 
sensor using distributed linear iterations. Each distributed iteration involves local communication 
among the sensors. In particular, each sensor updates its own local estimate of the average by a 
weighted linear combination of the corresponding estimates of its neighbors. The weights that are 
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represented in a network weight matrix W typically drive the importance of the measurements of 
the different neighbors. 

One of the important characteristics of the distributed consensus algorithms is the rate of 
convergence to the asymptotic solution. In many cases, the average consensus solution can be 
reached by successive multiplications of W with the vector of initial sensor values. Furthermore, 
it has been shown in [5] that in the case of fixed network topology, the convergence rate depends 
on the second largest eigenvalue of W, \2(W). In particular, the convergence is faster when the 
value of \2(W) is small. Similar convergence results have been proposed recently in the case of 
random network topology [9, 10], where the convergence rate is governed by the expected value of 



The main research direction so far focuses on the computation of the optimal weights W that 
yield the fastest convergence rate to the consensus solution [5, 6, 7]. In this work, we diverge from 
methods that are based on successive multiplications of W, and we rather allow the sensors to use 
their previous estimates, in order to accelerate the convergence rate. This is similar in spirit to 
the works proposed [12, 13] that reach the consensus solution in a finite number of steps. They 
use respectively extrapolation methods and linear dynamical system formulation for fixed network 
topologies. In order to address more generic network topologies, we propose here to use a matrix 
polynomial p applied on the weight matrix W in order to shape its spectrum. Given the fact that 
the convergence rate is driven by \2(W), it is therefore possible to impact on the convergence rate 
by careful design of the polynomial p. In the implementation viewpoint, working with p(W) is 
equivalent to each sensor aggregating its value periodically using its own previous estimates. We 
further formulate the problem of the computation of the optimal coefficients for both static and 
dynamic network topologies. We show that this problem can be efficiently and globally solved in 
both cases by the definition of a semi-definite program (SDP). 

The rest of this paper is organized as follows. In Section 2 we review the main convergence 
results of average consensus in both fixed and dynamic random network topologies. Next, in Section 
3 we introduce the polynomial filtering methodology and discuss its implementation for distributed 
consensus problems. We compute the optimal polynomial filter in Section 4 for both static and 
dynamic network topologies. In Section 5 we provide simulation results that verify the validity and 
the effectiveness of our method. Related work is finally presented in Section 6. 

2 Convergence in Distributed Consensus Averaging 

Let us first define formally the problem of distributed consensus averaging. Assume that initially 
each sensor i reports a scalar value xq(i) £ K. We denote by xq = [xq(1), . . . ,xo(n)] T £ M. n the 
vector of initial values on the network. Denote by 



the average of the initial values of the sensors. However, one rarely has a complete view of the 
network. The problem of distributed averaging therefore becomes typically to compute fi at each 
sensor by distributed linear iterations. In what follows we review the main convergence results for 
distributed consensus algorithms on both fixed and random network topologies. 



the X 2 (W), E[X 2 (W)}. 
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2.1 Static network topology 

We model the static network topology as an undirected graph Q = (V, £ ) with nodes V = {1, . . . , n} 
corresponding to sensors. An edge £ £ is drawn if and only if sensor i can communicate with 
sensor j. We denote the set of neighbors for node i as Mi = {j\ €. £}■ Unless otherwise stated, 
we assume that each graph is simple i.e., no loops or multiple edges are allowed. 
In this work, we consider distributed linear iterations of the following form 

xt+i(i) = Wuxtii) + WijXtU), ( 2 ) 

for i = 1, . . . ,n, where xt(j) represents the value computed by sensor j at iteration t. Since the 
sensors communicate in each iteration t, we assume that they are synchronized. The parameters 
Wij denote the edge weights of Q. Since each sensor communicates only with its direct neighbors, 
Wij = when ^ £■ The above iteration can be compactly written in the following form 

x t +i = Wx t , (3) 

or more generally 

t-i 

x t = (j[w)x = W t x . (4) 

i=0 

We call the matrix W that gathers the edge weights Wij, as the weight matrix. Note that W 
is a sparse matrix whose sparsity pattern is driven by the network topology. We assume that W 
is symmetric, and we denote its eigenvalue decomposition as W = QAQ T . The (real) eigenvalues 
can further be arranged as follows: 

Xi(W) > \ 2 (W) >...> X n -i(W) > \ n (W). (5) 

The distributed linear iteration given in eq. (3) converges to the average if and only if 

llT 

lim W* = , (6) 

t— >oo n 

where 1 is the vector of ones [5]. Indeed, notice that in this case 

11 T 

x* = lim xt = lim W t x$ = xq = [x\. 

t— >oo t— >oo n 

It has been shown that for fixed network topology the convergence rate of eq. (3) depends on 
the magnitude of the second largest eigenvalue \2{W) [5]. The asymptotic convergence factor is 
defined as 

rasymiw) = sup lim — ^rrr) / 1 ( 7 ) 

and the per-step convergence factor is written as 

r stcp (W) = sup ll f +1 ~^ 1 ' 2 . (8) 

Furthermore, it has been shown that the convergence rate relates to the spectrum of W, as 
given by the following theorem [5]. 
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Theorem 1 The convergence given by eq. (6) is guaranteed if and only if 

1 T W = 1 T , (9) 
Wl = 1, (10) 



11 T 



p (W ) < i, (ii) 



n 



where p(-) denotes the spectral radius of a matrix. Furthermore, 



r asym (W)=p(W-—), (12) 



n 



r s te P (W) = \\W - —\\ 2 . (13) 



n 



According to the above theorem, -^1 is a left and right eigenvector of W associated with the 
eigenvalue one, and the magnitude of all other eigenvalues is strictly less than one. Note finally, that 
since W is symmetric, the asymptotic convergence factor coincides with the per-step convergence 
factor, which implies that the relations (12) and (13) are equivalent. 

We give now an alternate proof of the above theorem that illustrates the importance of the 
second largest eigenvalue in the convergence rate. We expand the initial state vector xo to the 
orthogonal eigenbasis Q of W; that is, 



1 

where v\ = (-3=,xo) and Vj = (qj,xo). We further assume that v\ / 0. Then, eq. (4) implies that 

n 

Xt = w t x = w t (^=i + Y J vjqj) 

n 

= ^ 1+ |> w "* 

n 



Observe now that if |Aj| < 1, Vj > 2, then in the limit, the second term in the above equation 
decays and 

x* = lim x t = —^1 = fj,l. 



We see that the smaller the value of A2(VF), the faster the convergence rate. Analogous conver- 
gence results hold in the case of dynamic network topologies discussed next. 



2.2 Dynamic network topology 

Let us consider now networks with random link failures, where the state of a link changes over 
the iterations. In particular, we use the random network model proposed in [9, 10]. We assume 
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that the network at any arbitrary iteration t is Q{t) = (V,£ (t)), where £{t) denotes the edge set at 
iteration t, or equivalently at time instant t. Since the network is dynamic, the edge set changes 
over the iterations, as links fail at random. We assume that £{t) C £*, where £* C V x V is the 
set of realizable edges when there is no link failure. 

We also assume that each link fails with a probability (1 —pij), independently of the other 
links. Two random edge sets £{t\) and £(£2) at different iterations t\ and £2 are independent. The 
probability of forming a particular £{t) is thus given by j)e£(t)Pv- We define the matrix P as 

p. . = [Py if G £* and i / j, 
)o otherwise. 

The matrix P is symmetric and its diagonal elements are zero, since it corresponds to a simple 
graph. It represents the probabilities of edge formation in the network, and the edge set £(t) is 
therefore a random subset of £* driven by the P matrix. Finally, the weight matrix W becomes 
dependent on the edge set since only the weights of existing edges can take non zero values. 
In the dynamic case, the distributed linear iteration of eq. (2) becomes 

xt+i(i) = W u (t)x t (i) + WijWxttj) (15) 

or in compact form, 

x t+ i = W t x t , (16) 

where Wt denotes the weight matrix corresponding to the graph realization Q (t) of iteration t. The 
iterative relation given by eq. (16) can be written as 



t-i 

■r, -- (n^) x o 

i=0 



Clearly, xt now represents a stochastic process since the edges are drawn randomly. The convergence 
rate to the consensus solution therefore depends on the behavior of the product Yll^o Wi- We say 
that the algorithm converges if 

Vx G M™, lim E\\x t - fil\\ = 0. (17) 

We review now some convergence results from [10], which first shows that 
Lemma 1 For any xq 6 M n , 

— 11 T 



\x t +i -fii\\ < ( [Jp(wi - —^-))\\ x o -nA 



n 

i=0 



It leads to the following convergence theorem [10] for dynamic networks. 

Theorem 2 If E[p(W - ^ ))] < 1, the vector sequence {xt}^ converges in the sense of eq. 
(17). 
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We define the convergence factor in dynamic network topologies as 



r 11 1 

r d {W) = E p(W ) 

L n ' 

This factor depends in general on the spectral properties of the induced network matrix and drives 
the convergence rate of eq. (16). The authors in [14] show that ^(^[W^])! < 1 is also a necessary 
and sufficient condition for asymptotic (almost sure) convergence of the consensus algorithm in the 
case of random networks, where both network topology and weights are random (in particular i.i.d 
and independent over time). 

Finally, it is interesting to note that the consensus problem in a random network relates to 
gossip algorithms. Distributed averaging under the synchronous gossip constraint implies that 
multiple node pairs may communicate simultaneously only if these node pairs are disjoint. In other 
words, the set of links implied by the active node pairs forms a matching of the graph. Therefore, 
the distributed averaging problem described above is closely related to the distributed synchronous 
algorithm under the gossip constraint that has been proposed in [15, Sec. 3.3.2]. It has been shown 
in this case that the averaging time (or convergence rate) of a gossip algorithm depends on the 
second largest eigenvalue of a doubly stochastic network matrix. 



3 Accelerated consensus with polynomial filtering 
3.1 Exploiting memory 

As we have seen above, the convergence rate of the distributed consensus algorithms depends in 
general on the spectral properties of an induced network matrix. This is the case for both fixed and 
random network topologies. Most of the research work has been devoted to finding weight matrix 
W for accelerating the convergence to the consensus solution when sensors only use their current 
estimates. We choose a different approach where we exploit the memory of sensors, or the values 
of previous estimates in order to augment to convergence rate. 

Therefore, we have proposed in our previous work [12] the Scalar Epsilon Algorithm (SEA) 
for accelerating the convergence rate to the consensus solution. SEA belongs to the family of 
extrapolation methods for accelerating vector sequences, such as eq. (3). These methods exploit 
the fact that the fixed point of the sequence belongs to the subspace spanned by any £+1 consecutive 
terms of it, where £ is the degree of the minimal polynomial of the sequence generator matrix (for 
more details, see [12] and references therein). SEA is a low complexity algorithm, which is ideal for 
sensor networks and it is known to reach the consensus solution in 2£ steps. However, £ is unknown 
in practice, so one may use all the available terms of the vector sequence. Hence, the memory 
requirements of SEA are 0(T), where T is the number of terms. Moreover, SEA assumes that the 
sequence generator matrix (e.g., W in the case of eq. (3)) is fixed, so that it does not adapt easily 
to dynamic network topologies. 

In this paper, we propose a more flexible algorithm based on the polynomial filtering technique. 
Polynomial filtering permits to "shape" the spectrum of a certain symmetric weight matrix, in 
order to accelerate the convergence to the consensus solution. Similarly to SEA, it allows the 
sensors to use the value of their previous estimates. However, the polynomial filtering methodology 
introduced below presents three main advantages: (i) it is robust to dynamic topologies (ii) it has 
explicit control on the convergence rate and (iii) its memory requirements can be adjusted to the 
memory constraints imposed by the sensor. 



6 



3.2 Polynomial filtering 



Starting from a given (possibly optimal) weight matrix W, we propose the application of a polyno- 
mial filter on the spectrum of W in order to impact the magnitude of \2(W) that mainly drives the 
convergence rate. Denote by p k (X) the polynomial filter of degree k that is applied on the spectrum 
of W, 

Pfc(A) = a + a 1 \ + a 2 \ 2 + ... + a k \ k . (18) 
Accordingly, the matrix polynomial is given as 

p k (W) = a I + a\W + a 2 W 2 + ... + a k W k . (19) 

Observe now that 

Pk(W) = p k (QAQ T ) 

= a I + ai (QAQ T ) + ... + a k (QA k Q T ) 

= Qp k (A)Q T , (20) 

which implies that the eigenvalues of Pk(W) are simply the polynomial filtered eigenvalues of W 
i.e., p k (Xi(W)), i = l,...,n. 

In the implementation level, working on Pk(W) implies a periodic update of the current sensor's 
value with a linear combination of its previous values. To see why this is true, we observe that: 

xt+k+i = Pk{W)x t 

= a x t + aiWx t + . . . + a k W k x t 

= a x t + aixt+i + ■ • • + a k x t+k . (21) 

A careful design of p k may impact the convergence rate dramatically. Then, each sensor typically 
applies polynomial filtering for distributed consensus by following the main steps tabulated in 
Algorithm 1. 

Note that, for both fixed and random network topology cases, the a k s are computed off-line 
assuming that W and respectively .E[W] are known a priori. In what follows, we propose different 
approaches for computing the coefficients «j of the filter p k . 



3.3 Newton's interpolating polynomial 

One simple and rather intuitive approach for the design of the polynomial p k (X) is to use Hermite 
interpolation. Recall that the objective is to dampen the smallest eigenvalues A2,...,A n of W, 
while keeping the eigenvalue one intact. Therefore, we assume that the spectrum of W lies in an 
interval [a, 1] and we impose smoothness constraints of p k at the left endpoint a. In particular, the 
polynomial p k {\) : [a, 1] — > R that we seek, will be determined by the following constraints: 



p k (a) = 0, p k \a) = 0, i = 1, . . . , k - 1 
Pfc(l) = 1, 



(22) 
(23) 



where p k \a) denotes the i-th. derivative of p k (X) evaluated at a. The dampening is achieved 
by imposing smoothness constraints of the polynomial on the left endpoint of the interval. The 
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Algorithm 1 Polynomial Filtered Distributed Consensus 



Input: polynomial coefficients chq, . . . , afc+i, tolerance e. 

Output: average estimate p,. 

Initialization: 

flo = x (i). 

Set the iteration index t = 1. 
repeat 

if mod(i, fc + 1) ==0 then 

x t (i) = aox t -k-i{i)+otiXt-k{i)+ot2Xt-k+i{i) + - ■ .+afcX t _i(£) {polynomial filtered update} 
x t (i) = Wux t {i) + EjeM w ij x t(j)- 
else 

x t (i) = Waxt-i{i) + EjeM WijXt-x{j). 
end if 

Increase the iteration index t = t + 1. 



/i 4 = X t (iJ. 

until fa-fit-i < e 




Figure 1: Newton's polynomial of various degrees A:. 



computed polynomial will have degree equal to k. Finally, the coefficients of that satisfies the 
above constraints can be computed by the Newton's divided differences table. 

Figure 1 shows the form of Pfc(A) for a = and different values of the degree k. As k increases, 
the dampening of the small eigenvalues becomes more effective. It is worth mentioning that the 
design of Newton's polynomial does not depend on the network size or the size of the weight 
matrix W. What is only needed is a left endpoint a, which encloses the spectrum of W, as well 
as the desired degree k, which moreover may be imposed by memory constraints. This feature 
of Newton's polynomial is very interesting and it is particularly appealing in the case of dynamic 
network topologies. However, the above polynomial design is mostly driven by intuitive arguments, 
which tend to obtain small eigenvalues for faster convergence. In the following section, we provide 
an alternative technique for computing the polynomial filter that optimizes the convergence rate. 
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4 Optimal polynomial filters 



4.1 Polynomial filtering for static network topologies 

We are now interested in finding the polynomial that leads to the fastest convergence of linear 
iteration described in eq. (3), for a given weight matrix W and a certain degree k. For notational 
ease, we call W p = Pk(W) = Yli=o a i^ 1 - According to Theorem 1, the optimal polynomial is the 
one that minimizes the second largest eigenvalue of W p , i.e., p(W p — ). Therefore, we need to 
solve an optimization problem where the optimization variables are the k+ 1 polynomial coefficients 
ckq, • • • , oik and the objective function is the spectral radius of W p — . 



Optimization problem: OPT1 

min QeRfc+ i p{Yli=o a i wi ~ Hr) 
subject to 

(Eto«^)i = i- 



Interestingly, the optimization problem OPT1 is convex. First, its objective function is convex, 
as stated in Lemma 2. 

Lemma 2 For a given symmetric weight matrix W and degree k, p( J2i=o &iW % — ) is a convex 
function of the polynomial coefficients oti 's. 



Proof: Let /?, 7 € R k+1 and < 6 < 1. Since W is symmetric, ^i=o a *^ ^ s a ^ so symmetric. 
Hence, the spectral radius is equal to the matrix 2-norm. Thus, we have 



k 



T 



- J>A + (1 - 9H)w* - 



11 



i=0 

k 

k 



n 



11 



< 9 



11 



i=0 

k 



n 

11 T 



i=0 

fc 



+ (1-0)11^7^ 



11' 

n 
11 T 



i=0 



n 



i=0 



i=0 



which proves the Lemma. In addition, the constraint of OPT1 is linear which implies that the set 
of feasible c^'s is convex. As OPT1 minimizes a convex function over a convex set, the optimization 
problem is indeed convex. 

In order to solve OPT1, we use an auxiliary variable s to bound the objective function, and 
then we express the spectral radius constraint as a linear matrix inequality (LMI). Thus, we need 
to solve the following optimization problem. 
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Optimization problem: 


OPT2 


mnl seIK,aeIK fe + 1 S 




subject to 






n — ' 


(Eto«^)i = i- 





Recall that since W is symmetric, W p = Yli=o a iW l will be symmetric as well. Hence, the constraint 
W p l = 1 is sufficient to ensure that 1 will be also a left eigenvector of W p . The spectral radius 
constraint, 

11 T 

si <W V < si 

n 

ensures that that all the eigenvalues of W p other than the first one, are less or equal to s. Due 
to the LMI, the above optimization problem becomes equivalent to a semi-definite program (SDP) 
[16]. SDPs are convex problems and can be globally and efficiently solved. The solution to OPT2 
is therefore computed efficiently in practice, where the SDP only has a moderate number of k + 2 
unknowns (including s). 



4.2 Polynomial filtering for dynamic network topologies 



We extend now the idea of polynomial filtering to dynamic network topologies. Theorem 2 suggests 



that the convergence rate in the random network topology case is governed by E 



p(W 



li 1 

n 



Since W depends on a dynamic edge set, Pk(W) now becomes stochastic. Following the same 
intuition as above, we could form an optimization problem, similar to OPT1, whose objective 
function would be E[p( Yli=o a iW l — )]• Although this objective function can be shown to be 
convex, its evaluation is hard and typically requires several Monte Carlo simulations steps. 

Recall that the convergence rate of eq. (16) is related to the second largest eigenvalue of i£[W], 
which is much easier to evaluate. Let W denote the average weight matrix £7[W]. We then observe 
that 

p(W-— )1 >pCw-—\ 

L n '\ \ n J 

which is due to Lemma 2 and Jensen's inequality. The above inequality implies that the spectral 

p(W— ) to be small too. Additionally, the authors 

seems to be closely related 



E 



n 



radius p^W— ) shall be small in order E 

provide experimental evidence in [10], which indicates that p[w - 

to the convergence rate of eq. (16). 

Based on the above facts, we propose to build our polynomial filter based on W. Hence, we 
formulate the following optimization problem for computing the polynomial coefficients ctj's in the 
random network topology case. 



Optimization problem: OPT3 

min QeRfc+ i p(ELo«^ - Hr) 
subject to 

(Eto«^)i = i- 
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0PT3 could be viewed as the analog of OPT1 for the case of dynamic network topology. The 
main difference is that we work on W, whose eigenvalues can be easily obtained. Using again the 
auxiliary variable s, we reach the following formulation for obtaining the a^s. 



Optimization problem: 


OPT4 


mm seR,aeIR fe + 1 s 




subject to 






n — ' 


(EL.«iW)i = i. 





Once W has been computed, this optimization problem is solved efficiently by a SDP similarly 
to the case of static networks. 



5 Simulation Results 
5.1 Setup 

In this section, we provide simulation results which show the effectiveness of the polynomial filtering 
methodology. First we introduce a few weight matrices that have been extensively used in the 
distributed averaging literature. Suppose that d{i) denotes the degree of the i-th sensor. It has 
been shown in [5, 6] that iterating eq. (3) with the following matrices leads to convergence to fil. 

• Maximum- degree weights. The Maximum-degree weight matrix is 

i£(i,j)€£, 



n 



n 

otherwise. 
Metropolis weights. The Metropolis weight matrix is 

l+max{d(i),d(j)} lf 



1-^ i = j, (24) 



^-J2 {i , k) esW ik i = j, (25) 
otherwise. 



• Laplacian weights. Suppose that A is the adjacency matrix of Q and D is a diagonal matrix 
which holds the vertex degrees. The Laplacian matrix is defined as L = D — A and the 
Laplacian weight matrix is defined as 

W = I - 7 L, (26) 
where the scalar 7 must satisfy 7 < l/d m ax [5]. 

The sensor networks are built using the random geographic graph model [19]. In particular, 
we place n nodes uniformly distributed on the 2-dimensional unit area. Two nodes are adjacent if 

their Euclidean distance is smaller than r = \J X ° g ^ in order to guarantee connectedness of the 
graph with high probability [19]. 

Finally, the SDP programs for optimizing the polynomial filters are solved in Matlab using the 
SeDuMi [18] solver 1 . 



1 Publically available at: http://sedumi.mcmaster.ca/ 
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(a) SDP polynomial filter pjb(A), A m i n < A < 1 (b) Effect on the spectrum \(W) 



Figure 2: Effect of polynomial filtering on the spectrum of the Maximum-degree matrix of eq. (24). 



5.2 Static network topologies 

We illustrate first the effect of polynomial filtering on the spectrum of W. We build a network of 
n = 50 sensors and we apply polynomial filtering on the Maximum-degree weight matrix W, given in 
(24). We use k = 4 and we solve the optimization problem OPT2 using the Maximum-degree matrix 
W as input. Figure 2(a) shows the obtained polynomial filter Pfc(A), when A G [A m i n (W), 1]. Next, 
we apply the polynomial on W and Figure 2(b) shows the spectrum of W before (star-solid line) 
and after (circle-solid line) polynomial filtering, versus the vector index. Observe that polynomial 
filtering dramatically increases the spectral gap |1 — A2I, which further leads to accelerating the 
distributed consensus, as we show in the simulations that follow. 

Then we compare the performance of the different distributed consensus algorithms, with all 
the aforementioned weight matrices; that is, Maximum-degree, Metropolis and Laplacian weight 
matrices for distributed averaging. We compare both Newton's polynomial and the SDP polynomial 
(obtained from the solution of OPT2) with the standard iterative method, which is based on 
successive iterations of eq. (3). For the sake of completeness, we also provide the results of the 
scalar epsilon algorithm (SEA) that uses all previous estimates [12]. 

First, we explore the behavior of polynomial filtering methods under variable degree k from 2 
to 6 with step 2. We use the Laplacian weight matrix for this experiment. Figures 3(a) and 3(b) 
illustrate the evolution of the absolute error \\xt — /xX 1 1 2 versus the iteration index t, for polynomial 
filtering with SDP and Newton's polynomials respectively. We also provide the curve of the standard 
iterative method as a baseline. Observe first that both polynomial filtering methods outperform the 
standard method by exhibiting faster convergence rates, across all values of k. Notice also, that the 
degree k governs the convergence rate, since larger k implies more effective filtering and therefore 
faster convergence. Finally, the stagnation of the convergence process of the SDP polynomial 
filtering and large values of k is due to the limited accuracy of the SDP solver. 

Next, we show the results obtained with the other two weight matrices on the same sensor net- 
work. Figures 4(a) and 4(b) show the convergence behavior of all methods for the Maximum-degree 
and Metropolis matrices respectively. In both polynomial filtering methods we use a representative 
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(a) SDP polynomial filtering 



(b) Newton polynomial filtering 



Figure 3: Behavior of polynomial filtering for variable degree k on fixed topology using the Laplacian 
weight matrix. 




Figure 4: Comparison between Newton's polynomial and SDP polynomial {k = 4) on fixed topology. 

value of k, namely 4. Notice again that polynomial filtering accelerates the convergence of the 
standard iterative method (solid line). As expected, the optimal polynomial computed with SDP 
outperforms Newton's polynomial, which is based on intuitive arguments only. 

Finally, we can see from Figures 3 and 4 that in some cases the convergence rate is comparable 
for SEA and SDP polynomial filtering. Note however that the former uses all previous iterates, in 
contrast to the latter that uses only the k + 1 most recent ones. Hence, the memory requirements 
are smaller for polynomial filtering, since they are directly driven by k. This moreover allows more 
direct control on the convergence rate, as we have seen in Fig. 3. Interestingly, we see that the 
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Figure 5: Random network topology simulations, q denotes the probability that the network 
changes at each iteration. 

convergence process is smoother with polynomial filtering, which further permits easy extension to 
dynamic network topologies. 

5.3 Dynamic network topologies 

We study now the performance of polynomial filtering for dynamic networks topologies. We build 
a sequence of random networks of n = 50 sensors, and we assume that in each iteration the 
network topology changes independently from the previous iterations, with probability q and with 
probability 1 — g it remains the same as in the previous iteration. We compare all methods for 
different values of the probability q. We use the Laplacian weight matrix (26). In the SDP 
polynomial filtering method, we solve the SDP program OPT4 (see Sec. 4.2). Fig. 5 shows the 
average performance of polynomial filtering for some representative values of the degree k and the 
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probability q. The average performance is computed using the median over the 100 experiments. 
We have not reported the performance of the SEA algorithm, since it is not robust to changes of 
the network topology. 

Notice that when k = 1 (i.e., each sensor uses only its current value and the right previous one) 
polynomial filtering accelerates the convergence over the standard method. At the same time, it 
stays robust to network topology changes. Also, observe that in this case, the SDP polynomial 
outperforms Newton's polynomial. However, when k = 2, the roles between the two polynomial 
filtering methods change as the probability q increases. For instance, when q = 0.8, the SDP 
method even diverges. This is expected if we think that the coefficients of Newton's polynomial are 
computed using Hermite interpolation in a given interval and they do not depend on the specific 
realization of the underlying weight matrix. Thus, they are more generic than those of the SDP 
polynomial that takes W into account, and therefore less sensitive to the actual topology realization. 
Algorithms based on optimal polynomial filtering become inefficient in a highly dynamic network, 
whose topology changes very frequently. 

6 Related work 

Several works have studied the convergence rate of distributed consensus algorithms. In particular, 
the authors in [5] and [9, 10] have shown that the convergence rate depends on the second largest 
eigenvalue of the network weight matrix, for fixed and random networks, respectively. They both 
use semi-definite programs to compute the optimal weight matrix, and the optimal topology. 

Other works have addressed the consensus problem, and we mention here only the most relevant 
ones. A. Olshevsky and J. N. Tsitsiklis in [8] propose two consensus algorithms for fixed network 
topologies, which build on the "agreement algorithm". The proposed algorithms make use of 
spanning trees and the authors bound their worst-case convergence rate. For dynamic network 
topologies, they propose an algorithm which builds on a previously known distributed load balancing 
algorithm. In this case, the authors show that the algorithm has a polynomial bound on the 
convergence time (e-convergence). 

The authors in [25] study the convergence properties of agreement over random networks fol- 
lowing the Erdds and Renyi random graph model. According to this model, each edge of the graph 
exists with probability p, independently of other edges and the value of p is the same for all edges. 
By agreement, we consider the case where all nodes of the graph agree on a particular value. The 
authors employ results from stochastic stability in order to establish convergence of agreement over 
random networks. Also, it is shown that the rate of convergence is governed by the expectation of 
an exponential factor, which involves the second smallest eigenvalue of the Laplacian of the graph. 

Gossip algorithms have also been applied successfully to solving distributed averaging prob- 
lems. In [15] provide convergence results on randomized gossip algorithm in both synchronous and 
asynchronous settings. Based on the obtained results, they optimize the network topology (edge 
formation probabilities) in order to maximize the convergence rate of randomized gossip. This 
optimization problem is also formulated as a semi-definite program (SDP). In a recent study, the 
authors in [20] have been able to improve the standard gossip protocols in cases where the sensors 
know their geometric positions. The main idea is to exploit geographic routing in order to aggregate 
values among random nodes that are far away in the network. 

Under the same assumption of knowing the geometric positions of the sensors, the authors in 
[21] propose a fast consensus algorithm for geographic random graphs. In particular, they utilize 
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location information of the sensors in order to construct a nonreversible lifted Markov chain that 
mixes faster than corresponding reversible chains. The main idea of lifting is to distinguish the 
graph nodes from the states of the Markov chain and to "split" the states into virtual states that 
are connected in such a way that permits faster mixing. The lifted graph is then "projected" back 
to the original graph, where the dynamics of the lifted Markov chain are simulated subject to the 
original graph topology. However, the proposed algorithm is not applicable in the case where the 
nodes' geographic location is not available. 

In [22] the authors propose a cluster-based distributed averaging algorithm, applicable to both 
fixed linear iteration and random gossiping. The induced overlay graph that is constructed by 
clustering the nodes is better connected relatively to the original graph; hence, the random walk on 
the overlay graph mixes faster than the corresponding walk on the original graph. Along the same 
lines, K. Jung et al. in [23], have used nonreversible lifted Markov chains to accelerate consensus. 
They use the lifting scheme of [24] and they propose a deterministic gossip algorithm based on a 
set of disjoint maximal matchings, in order to simulate the dynamics of the lifted Markov chain. 

Finally, even if we have mostly considered synchronous algorithms in this paper, it is worth 
mentioning that the authors in [26] propose two asynchronous algorithms for distributed averaging. 
The first algorithm is based on blocking (that is, when two nodes update their values they block 
until the update has been completed) and the other algorithm drops the blocking assumption. 
The authors show the convergence of both algorithms under very general asynchronous timing as- 
sumptions. Moreover, the authors in [27] propose consensus propagation, which is an asynchronous 
distributed protocol that is a special case of belief propagation. In the case of singly-connected 
graphs (i.e., connected with no loops), synchronous consensus propagation converges in a number 
of iterations that is equal to the diameter of the graph. The authors provide convergence analysis 
for regular graphs. 

7 Conclusions 

In this paper, we proposed a polynomial filtering methodology in order to accelerate distributed av- 
erage consensus in both fixed and random network topologies. The main idea of polynomial filtering 
is to shape the spectrum of the polynomial weight matrix in order to minimize its second largest 
eigenvalue and subsequently increase the convergence rate. We have constructed semi-definite 
programs to compute the optimal polynomial coefficients in both static and dynamic networks. 
Simulation results with several common weight matrices have shown that the convergence rate is 
much higher than for state-of-the-art algorithms in most scenarios, except in the specific case of 
highly dynamic networks and small memory sensors. 
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